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Abstract 

Three  exciting  new  methods  that  address  the  accurate  prediction  of  processes 
and  properties  of  large  molecular  systems  are  discussed.  The  systematic  fragmentation 
method  (SEM)  and  the  fragment  molecular  orbital  (EMO)  method  both  decompose  a 
large  molecular  system  (e.g.,  protein,  liquid,  zeolite)  into  small  subunits  (fragments)  in 
very  different  ways  that  are  both  designed  to  retain  the  high  accuracy  of  the  chosen 
quantum  mechanical  level  of  theory  while  greatly  reducing  the  demands  on 
computational  time  and  resources.  Both  of  these  methods  are  inherently  scalable  and  are 
therefore  eminently  capable  of  taking  advantage  of  massively  parallel  computer 
hardware.  The  effective  fragment  potential  (EEP)  method  is  a  very  sophisticated 
approach  for  the  prediction  on  non-binded  and  intermolecular  interactions.  Therefore,  the 
EEP  method  provides  a  way  to  further  reduce  the  computational  effort  while  retaining 
accuracy,  by  treating  the  far  field  interactions  in  place  of  the  full  electronic  structure 
method.  The  performance  of  the  methods  is  demonstrated  using  applications  to  several 
systems,  including  benzene  dimer,  small  organic  species,  pieces  of  the  alpha  helix,  water, 
and  ionic  liquids. 
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1.  Introduction 


The  development  of  quantum  chemistry  methods  in  the  1980s  and  1990s 
primarily  focused  on  performing  very  accurate  calculations  on  relatively  small  molecular 
systems.  The  desire  for  accurate  calculations  on  larger  molecular  species  led  to  several 
formulations  employing  more  efficient  scaling,  as  well  as  additivity  of  basis  set 
improvement  and  higher  levels  of  electron  correlation.  With  regard  to  the  latter,  the 
Gaussian  G(n)  methods  and  the  Weizmann  W(n)  methods  are  well  known,  along  with 
several  variants.^ 

Simultaneous  progress  in  the  development  of  systematically  improving  atomic 
basis  sets  also  provided  a  path  toward  systematic  increases  in  accuracy.  It  was 
recognized"^  that  basis  functions  optimized  for  atomic  correlation  are  also  capable  of 
describing  molecular  correlation  effects.  Dunning  and  co-workers,  for  example, 
introduced  a  series  of  correlation  consistent  basis  set  sets^  based  upon  these  conclusions, 
capable  of  accurately  treating  electron  correlation  with  a  compact  set  of  primitive 
Gaussian  functions.  These  basis  sets  can  be  used  in  a  systematic  way  to  obtain  results 
approaching  the  complete  basis  set  (CBS)  limit.  However,  increasingly  large  basis  sets 
must  be  used,  and  the  convergence  tends  to  be  slow.  Werner  has  recently  introduced  a 
series  of  F12  basis  sets^  with  improved  convergence  to  the  CBS  limit.  The  high  accuracy 
of  these  basis  sets  still  comes  at  a  significant  computational  cost,  only  feasible  on 
relatively  small  systems. 

Chemical  phenomena  occur  in  condensed  phases  as  well  as  in  the  gas  phase,  and 
many  methods  have  been  developed  to  treat  the  chemical  environment^  and  condensed 
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phase  phenomena.  The  desire  to  study  ever  larger  systems  led  to  combining  quantum 
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mechanics  (QM)  with  molecular  mechanics  (MM).  Several  such  combinations,  known  as 
QM/MM  methods,^  have  been  developed  since  the  initial  work  of  Warshel,^^  including 
multi-layer  methods  such  as  ONIOM,^'^  the  Truhlar  MCMM  methods,'^  and  the  effective 
fragment  potential  method  (EFP)  '  developed  by  Gordon  and  co-workers.  The  EFP 
method  will  be  discussed  in  detail  as  a  means  to  investigate  non-bonded  and 
intermolecular  interactions  via  the  automatic  generation  of  a  model  potential  that  is 
derived  from  first  principles.  While  hybrid  methods  have  expanded  the  size  of  systems 
that  are  accessible  to  computations,  the  use  of  classical  model  potentials  for  the 
description  of  the  environment  can  be  a  limiting  factor,  given  that  the  electron  density  of 
the  MM  region  and  its  impact  on  the  QM  region  is  not  usually  properly  accounted  for. 

Alternative  approaches  to  QM/MM  methods  are  fragmentation  methods,  in  which 
the  system  is  broken  (“fragmented”)  into  smaller  pieces,  each  of  which  is  considered 
essentially  independently  by  a  specified  level  of  electronic  structure  theory. 
Fragmentation  methods  have  the  advantage  that  they  are  fully  quantum  mechanical  in 
nature.  Several  general  fragmentation  methods  have  been  proposed,  including  molecular 
fragmentation  with  conjugated  caps  (MFCC),  the  elongation  method,  the  molecular 
tailoring  approach  (MTA),  the  fast  electron  correlation  method  for  molecular  clusters 
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developed  by  Hirata,  Truhlar's  electrostatically  embedded  many-body  (EE-MB) 
expansion,  multi-centered  QM/QM  methods,  the  systematic  fragmentation  method 
(SFM),^"^'^^  and  the  fragment  molecular  orbital  (FMO)  method.^^'"^^  The  latter  two 
methods,  the  SFM  and  the  FMO  methods,  will  be  discussed  in  detail  in  this  work. 

Instead  of  separating  a  system  into  two  regions  that  are  described  by  two  very 
different  levels  of  theory  (QM  and  MM),  fragmentation  methods  that  divide  a  system  into 
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many  smaller  pieces,  all  of  which  are  described  by  the  same  level  of  QM  theory,  have 
been  proposed  since  the  1970s."^^  By  approaching  a  large  system  in  this  way,  each  smaller 
fragment  can  be  treated  using  high  levels  of  theory,  providing  the  desired  accuracy  and 
an  improvement  in  speed.  The  earliest  attempts"^^  constructed  a  set  of  fragments  from 
common  chemical  groups  (methyl,  amino  etc.)  and  used  a  selection  of  these  fragments  to 
build  larger  molecules.  More  recent  fragmentation  methods  '  begin  with  the  larger 
molecule  of  interest  and  break  the  system  into  smaller  fragments. 

Fragmentation  methods  must  also  treat  the  environment  (e.g.,  the  remainder  of  the 
entire  molecular  system,  or  a  solvent)  around  each  fragment  in  some  approximate,  but 
realistic  manner.  When  fragmenting  a  molecule  or  a  molecular  system,  each  fragment  no 
longer  electronically  “feels”  the  remainder  of  the  initial  system,  unless  one  devises  some 
way  to  retain  the  lost  interactions.  This  issue  is  addressed  in  the  FMO  method  by 
performing  each  individual  fragment  calculation  in  a  Coulomb  “bath”  represented  by  the 
electrostatic  potential  (ESP)  of  the  entire  system.  Further  corrections  to  the  FMO  method 
are  achieved  by  performing  fully  quantum  mechanical  two-fragment  (dimer)  and  three- 
fragment  (trimer)  calculations.  In  the  SFM  method^"^  the  effects  of  other  fragments  are 
incorporated  by  including  overlapping  fragments  in  such  a  manner  that  the  double 
counting  of  atoms  is  accounted  for,  and  non-bonded  interactions  are  captured  by 
employing  classical  potentials.  Accurately  capturing  non-bonded  effects  is  essential  to 
maintaining  kcal/mol  accuracy  compared  to  full  ab  initio  studies. 

Traditional  electronic  structure  methods,  such  as  Hartree-Fock  (HF),  second  order 
perturbation  theory  (MP2),  and  coupled  cluster  theory  (e.g.  CCSD(T))  have  rapidly 
increasing  resource  requirements  (e.g.  time,  memory,  mass  storage).  For  example,  the 
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HF,  MP2  and  CCSD(T)  computer  time  requirements  scale  as  0(n"^),  O(n^)  and  O(n^) 
respeetively,  where  n  measures  the  size  of  the  system,  e.g.,  in  terms  of  the  basis  set  size. 
Further,  CCSD(T)  memory  requirements  scale  as  0(n"^),  while  disk  requirements  are 
diffieult  to  uniquely  define.  One  approaeh  to  addressing  the  eomputational  scaling  issue 
is  to  develop  highly  parallel  algorithms.  The  development  of  parallel  algorithms  for 
eleetronic  strueture  theory  has  been  an  active  researeh  area  for  -'20  years,  and 
considerable  progress  has  been  achieved  for  increasingly  eomplex  QM  methods. Such 
efforts  may  be  referred  to  as  fine-grained  parallelism,  in  the  sense  that  eaeh  energy  or 
derivative  evaluation  itself  takes  advantage  of  many  eores,  usually  in  a  distributed 
manner.  In  many  fragmentation  methods  eaeh  fragment  calculation  can  be  performed 
essentially  independently  of  all  the  others.  This  leads  to  a  multi-level  parallelism,  sinee 
the  energy  of  eaeh  fragment  ean  be  obtained  on  a  separate  node  (eoarse-grained 
parallelism),  while  the  fine-grained  parallelism  ean  be  exploited  within  eaeh  node."^^  If  a 
fragmentation  method  is  implemented  to  take  advantage  of  this  ability,  large  reduetions 
in  required  computational  resources  ean  be  achieved,  faeilitating  calculations  on,  for 
example,  condensed  phases,  proteins  and  surfaces.  Fragmentation  approaehes  with 
multi-level  parallelism  also  expand  the  capabilities  of  modest  (e.g.,  single  group  or 
departmental)  computer  systems. 

The  present  work  focuses  on  three  methods  that  have  been  designed  to  aceurately 
treat  large  systems:  EFP,  SFM,  and  FMO.  Benzene  dimer  is  ehosen  as  a  representative 
example  to  illustrate  the  aecuraey  and  efficiency  of  the  EFP  method,  although  several 
sueh  studies  have  been  earned  out,^'^  as  have  EFP  moleeular  dynamies  simulations.^^  The 
EFP  approaeh  has  also  been  included  as  a  means  to  eapture  aceurate  non-bonded 
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interactions  within  the  SFM  framework,  as  a  replacement  for  simple  electrostatics.  It 
will  be  illustrated  that  the  FMO  method  can  be  used  to  accurately  describe  a  series  of 
water  clusters  and  ionic  liquid  systems. 

2.  Effective  Fragment  Potential  (EFP)  Method 

The  generalized  effective  fragment  potential  (EFP2)  method  is  a  first-principles 
based  model  potential  for  the  evaluation  of  intermolecular  forces.  This  is  a  modification 
and  extension  of  the  original  EFPl  water  model'^’'^  to  general  systems.  There  are  five 
EEP-EEP  interaction  terms  in  the  EPP2  model  potential,  along  with  damping  functions, 
each  of  which  may  be  thought  of  as  a  truncated  expansion: 

Coulombic  (electrostatic),  induction  (polarization),  exchange  repulsion,  dispersion  (Van 
der  Waals)  ,  and  charge  transfer. 

E  ~  Ecoul  Eind  Eexrep  Edisp  Ect  (1) 

In  EEPI,  the  exchange  repulsion,  Eexrep,  and  charge  transfer,  Ed,  components  are  folded 
into  one  term  that  contains  fitted  parameters,  and  there  is  no  dispersion  contribution. 

EEPI  has  been  integrated  with  HE,^^  DET,^"^  MCSCE,'^  singly  excited  configuration 

interaction  (CIS),^^  and  time-dependent  density  functional  theory  (TDDET).'^  The  EEP2- 
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QM  interface  is  still  under  development. 

The  five  terms  in  the  EEP  potential  may  be  grouped  into  long  range,  (1/R)"  distance 
dependent,  or  short  range  interactions,  which  decay  exponentially.  The  Coulombic, 
induction,  and  dispersion  are  long-range  interactions,  whereas  the  exchange  repulsion  and 
charge  transfer  are  short  range.  EEP  has  been  described  in  detail  in  several  papers, 
therefore  only  a  brief  overview  of  the  terms  will  be  presented  below. 

The  Coulomb  portion  of  the  electrostatic  interaction,  Ecoui,  is  obtained  using  the 
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Stone  distributed  multipolar  analysis/^  This  expansion  is  truncated  at  the  octopole  term. 
Atom  centers  and  the  bond  midpoints  are  used  as  expansion  points. 

Induction  (polarization),  Bind,  arises  from  the  interaction  of  an  induced  dipole  on 
one  fragment  with  the  permanent  dipole  on  another  fragment,  expressed  in  terms  of  the 
dipole  polarizability.  Truncating  at  the  first  (dipole)  term  in  the  polarizability  expansion 
is  viable,  since  the  molecular  polarizability  tensor  is  expressed  as  a  tensor  sum  of 
localized  molecular  orbital  (LMO)  polarizabilities.  Therefore,  the  number  of  bonds  and 
lone  pairs  in  the  system  is  equal  to  the  number  of  polarizability  points.  This  induction 
term  is  iterated  to  self-consistency,  so  it  is  able  to  capture  some  many-body  effects. 

Because  the  Coulomb  and  induction  terms  discussed  above,  as  well  as  the 
dispersion  interaction,  are  treated  by  classical  approximations,  the  shorter  range 
interactions  that  occur  when  quantum  mechanical  charge  densities  begin  to  overlap  are 
not  correctly  captured.  Therefore,  each  term  is  multiplied  by  a  damping  (screening) 
expression.  The  relative  merits  of  several  approaches  to  damping  have  recently  been 
analyzed  and  discussed  extensively.  Classical  Coulombic  interactions  become  too 
repulsive  at  short  range,  and  must  be  moderated  by  a  screening  term,  as  discussed  in 
several  previous  papers.  ’  Conversely,  the  induction  interaction  becomes  too  attractive 
in  the  short-range  regime,  so  a  damping  term  is  needed  here  as  well.  The  unphysical 
behavior  is  avoided  by  augmenting  the  electrostatic  multipoles  with  exponential  damping 
functions  of  the  form: 

/damp  =  1  -  exp(-a^?)  (2) 

where  parameters  a  are  determined  at  each  multipole  expansion  point  by  fitting  the 
multipole  damped  potential  to  reproduce  the  Hartree-Fock  potential.  Damping  terms  in 


7 


the  electrostatic  energy  are  derived  explicitly  from  the  damped  potential  and  the  charge 

density.  The  damping  procedure  can  be  extended  to  higher-order  electrostatic  terms;  that 
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is,  the  charge-  dipole,  dipole-dipole,  etc.,  interactions,  and  this  is  recommended. 
Damping  is  also  applied  to  the  induction  and  dispersion  energies.  ’  For  induction,  both 
exponential  damping,  as  in  Eq.  (2)  and  Gaussian  damping  are  effective,  but  the  Gaussian 
damping  seems  to  be  more  generally  applicable  and  is  therefore  recommended. 

The  exchange  repulsion  interaction  between  two  fragments  is  derived  as  an 
expansion  in  the  intermolecular  overlap.  When  this  overlap  expansion  is  expressed  in 
terms  of  frozen  LMOs  on  each  fragment,  the  expansion  can  reliably  be  truncated  at  the 
quadratic  term.  This  term  does  require  each  EFP  to  carry  a  basis  set.  Since  the  same  basis 
set  is  used  to  generate  the  multipoles  and  the  molecular  polarizability  tensor,  EFP 
calculations  are  basis  set  dependent.  The  smallest  recommended  basis  set  is  6- 
3 1++G(d,p)  The  dependence  of  the  computational  cost  of  an  EFP  calculation  on  the 
basis  set  appears  primarily  in  the  initial  generation  of  the  EFP.  Therefore,  one  can  employ 
much  larger  basis  sets  with  minimal  cost.  The  tests  presented  below  on  the  SFM  method 
use  the  6-3 1 1++G(3df,2p)  basis  set.  Since  the  basis  set  is  used  only  to  calculate  overlap 
integrals,  the  computation  is  very  fast  and  quite  large  basis  sets  are  realistic. 

Dispersion  interactions  are  often  expressed  by  an  inverse  R  expansion, 

E,,,  =  ’ZC.R-  (3) 

n 

where  the  coefficients  Cn  may  be  derived  from  the  (imaginary)  frequency  dependent 
polarizabilities  integrated  over  the  entire  frequency  range.  The  first  term  in  the 
expansion,  n=6,  corresponds  to  the  induced  dipole-induced  dipole  (Van  der  Waals) 
interactions.  In  the  EFP2  method,  this  term  is  evaluated  using  the  time-dependent  HF 
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method.  In  addition  the  contribution  of  the  n=8  term  is  estimated  empirically.  The  Ce 
coefficients  are  derived  in  terms  of  interactions  between  pairs  of  LMOs,  one  each  on  the 
two  interacting  fragments.  Because  the  dispersion  interaction  should  decrease  to  zero  at 
short  range,  each  dispersion  term  is  multiplied  by  a  damping  function.  Tang-Toennies 
damping  is  frequently  used  to  damp  dispersion.  However,  a  new  approach  that  is  based 
on  the  overlap  integrals  between  interacting  fragments  is  free  of  fitted  parameters  and 
appears  to  be  generally  applicable.  In  future  EFP  applications,  the  overlap-based 
dispersion  damping  is  recommended. 

The  charge  transfer  interaction  is  derived  using  a  supermolecule  approach,  in  which 
the  occupied  valence  molecular  orbitals  on  one  fragment  are  allowed  to  interact  with  the 
virtual  orbitals  on  another  fragment.  This  interaction  term  leads  to  significant  energy 
lowering  in  ab  initio  calculations  on  ionic  or  highly  polar  species  when  incomplete  basis 
sets  are  employed.  An  approximate  formula  for  the  charge  transfer  interaction  in  the 
EFP2  method  was  derived  and  implemented  using  a  second  order  perturbative  treatment 
of  the  intermolecular  interactions  for  a  pair  of  molecules  at  the  Hartree-Fock  level  of 
theory.  This  approximate  formula  is  expressed  in  terms  of  the  canonical  orbitals  from  a 
Hartree-Fock  calculation  of  the  isolated  molecules  and  uses  a  multipolar  expansion 
(through  quadrupoles)  of  the  molecular  electrostatic  potentials.  Orthonormality  is 
enforced  between  the  virtual  orbitals  of  the  other  molecule  and  all  of  the  orbitals  of  the 
considered  molecule,  so  that  the  charge  transfer  is  not  contaminated  with  induction.  This 
approximate  formula  has  been  implemented  in  the  EFP  method  and  gives  charge  transfer 
energies  comparable  to  those  obtained  directly  from  Hartree  Fock  calculations.  The 
analytic  gradients  of  the  charge  transfer  energy  were  also  derived  and  implemented. 
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enabling  efficient  geometry  optimization. 

It  is  useful  to  consider  the  relative  costs  of  the  five  EFP  interaction  terms.  Based  on 
relatively  small  molecules  and  taking  the  cost  of  the  Coulomb  and  dispersion  terms  to  be 
one  unit,  the  induction  interaction  would  cost  approximately  two  units,  exchange 
repulsion  would  cost  about  five  units,  and  charge  transfer  would  cost  ~10  units.  For 
larger  molecules,  the  relative  costs  of  exchange  repulsion  and  charge  transfer  will 
decrease  since  they  will  scale  linearly  in  the  large  molecule  limit.  As  always  in 
computational  chemistry  there  is  a  trade  off  between  computational  cost  and  accuracy. 

While  the  FFP  model  is  currently  a  rigid  body  model  potential,  analytic  gradients 
for  all  terms  have  been  derived  and  implemented,  so  full  intermolecular  geometry 
optimizations,  Monte  Carlo  and  molecular  dynamics  simulations^®’^'  can  be  performed. 
Because  the  method  involves  no  empirically  fitted  parameters,  an  EFP  for  any  system  can 
be  generated  by  a  “makefp”  run  in  the  GAMESS^"'  suite  of  programs.  This  automatic 
generation  makes  possible  the  use  of  the  EFP  method  for  treating  intermolecular  and  non- 
bonded  interactions  in  fragmentation  methods  such  as  the  SFM. 

2.1  The  EFP  Method  as  a  Model  for  Non-Bonded  Interactions 

Benzene  dimer  is  used  here  to  illustrate  the  accuracy  of  EFP -EFP  non-bonded 
interactions,  with  a  focus  on  the  n  -n  interactions  between  two  benzene  rings.  These  n  -n 
interactions  are  largely  driven  by  dispersion  and  are  therefore  difficult  to  account  for 
accurately  by  most  ab  initio  electronic  structure  methods.  Previous  theoretical  and 
experimental  studies  suggest  that  there  are  two  minima  on  the  benzene  dimer  potential 
energy  surface  '  the  perpendicular  T-shaped  and  parallel-slipped  configurations,  as 
shown  in  Figure  1.  A  sandwich  structure  with  two  parallel  benzene  rings,  also  shown  in 
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Figure  1,  is  a  saddle  point  that  eonneets  two  equivalent  parallel  slipped  struetures. 

Sherrill  and  eo-workers  ealeulated  potential  energy  curves  for  these  three  structures^^’^^ 
using  second  order  Moller-Plesset  perturbation  theory  (MP2)  and  coupled-cluster  theory 
with  single,  double  and  perturbative  triple  excitations  [CCSD(T)]  A  variety  of 

o 

augmented  correlation  consistent  basis  sets  were  used  with  both  the  MP2  and  CCSD(T) 
levels  of  theory.  Additionally,  they  employed  symmetry-adapted  perturbation  theory 
(SAPT)^"^  to  decompose  the  benzene  %  -n  interaction  energy  into  electrostatic,  dispersion, 
induction  and  exchange-repulsion  components  of  the  total  interaction  energy.  The 
binding  energies,  equilibrium  separations  and  SAPT  energy  decomposition  results  from 
their  work  compare  well  with  similar  results  obtained  using  the  EFP  method,  as 
illustrated  below. 

For  this  work,  the  EFP  for  benzene  was  constructed  with  the  6-31 1++G(3df,2p) 

basis  set,^^  using  the  MP2/aug-cc-pVTZ*"*  benzene  monomer  geometry  taken  from  Ref. 

57.  The  multipoles  for  benzene  were  generated  using  a  numerical  distributed  multipolar 

analysis  (DMA).^^  The  numerical  DMA  scheme  is  employed  due  to  the  instability  and 

basis  set  dependence  of  the  standard  analytic  DMA  scheme,  as  well  as  the  need  for 

diffuse  functions  to  properly  describe  the  exchange-repulsion  interactions  within  the  EFP 

framework.  Higher  order  (up  to  quadrupoles)  damping  terms  were  also  used  to  provide 

2 1 

an  accurate  description  of  charge  penetration  through  screening  of  the  potentials. 

The  EFP  binding  energies  and  corresponding  inter-ring  distances  for  the  three 
benzene  dimer  structures  are  in  good  agreement  with  the  analogous  ab  initio  values 
obtained  by  Sherrill  and  co-workers  (See  Table  1).  Relative  to  the  full  CCSD(T)/aug-cc- 
pVQZ  binding  energies,  the  EFP  method  over-binds  the  sandwich  dimer  by  0.4  kcal/mol 
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and  under-binds  the  T-shaped  structure  by  0.1  kcal/mol,  while  the  equilibrium 
intermolecular  separations  are  overestimated  by  approximately  0. 1-0.2  A.  In 
comparison,  MP2  with  the  same  basis  set  overestimates  the  binding  energies  by  1.7 
kcal/mol  and  0.9  kcal/mol  for  the  sandwich  and  T-shaped  dimers,  respectively,  and 
underestimates  the  equilibrium  distances  by  approximately  0. 1-0.2  A.  In  fact,  the  MP2 
binding  energies  become  successively  worse  compared  with  those  predicted  by  CCSD(T) 
as  the  basis  set  is  improved.  The  EFP  and  CCSD(T)  predicted  binding  energies  and 
structures  are  in  reasonable  agreement  with  each  other,  whereas  the  agreement  between 
MP2  and  CCSD(T)  is  not  as  good.  Table  1  summarizes  the  MP2,  CCSD(T)  and  EFP  total 
interaction  energies  of  all  three  benzene  dimer  structures.  A  comparison  of  the  total 
interaction  energy  decompositions  obtained  using  both  SAPT  and  the  EFP  method  shows 
good  agreement  for  the  sandwich  and  T-shaped  isomers  (See  Figures  2  and  3). 
Specifically,  the  error  in  the  EFP  method  compared  to  SAPT  for  the  dispersion, 
exchange-repulsion  and  polarization  interactions  is  in  the  range  0.2-0. 5  kcal/mol  for  these 
two  isomers.^^^ 

Highly  accurate  methods  involve  very  demanding  scaling  of  computational 
resources,  such  as  time,  memory  and  disk.  For  instance,  a  single  point  energy  calculation 
in  the  6-31 1++G(3df,  2p)  basis  set  (660  basis  functions)  by  MP2  requires  142  minutes  of 
CPU  time  on  one  IBM  Power5  processor,  whereas  the  analogous  EFP  calculation 
requires  only  0.4  seconds.  The  corresponding  CCSD(T)  calculation  would  be  much  more 
resource  demanding  than  MP2.  Taking  into  account  the  relatively  good  agreement  of  the 
EFP  method  results  with  the  CCSD(T)/aug-cc-pVQZ  results  described  above,  this 
significant  reduction  in  total  computation  time  comes  with  a  minimal  loss  of  accuracy. 
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3.  The  Systematic  Fragmentation  Method  (SFM) 

The  systematic  fragmentation  method  (SFM)  is  designed  to  permit  a  large 
molecular  system,  such  as  a  protein,  to  be  fragmented  into  smaller  pieces  in  such  a  way 
that  retains  the  accuracy  of  a  full  ab  initio  calculation  at  the  same  level  of  theory,  while 
significantly  decreasing  the  computational  expense.  By  treating  the  smaller  sub-systems 
with  accurate  levels  of  theory,  the  total  energy  and  properties  of  the  full  system  are 
obtained  through  addition  and  subtraction  of  the  contributions  from  the  overlapping  sub¬ 
systems  or  “groups.”  Many  body  effects  are  accounted  for  including  the  nearest  neighbor 
of  each  group.  Non-bonded  interactions  between  groups  are  also  accounted  for.  In  the 
original  formulation^"^  these  non-bonded  interactions  were  obtained  using  a  classical 
electrostatic  potential.  Recently,  this  non-bonded  description  has  been  improved  through 
the  use  of  the  EFP  method,  providing  a  more  accurate  representation  of  the  non-bonded 
interactions. 

Within  the  context  of  the  SFM,  a  molecule  can  be  thought  of  as  a  collection  of 
functional  groups.  For  example,  ethanol  contains  three  functional  groups  (CH3,  CH2,  and 
OH)  according  to  the  SFM  prescription.  To  fragment  the  system  into  functional  groups, 
single  bonds  are  broken.  This  process  splits  a  pair  of  bonding  electrons;  each  of  these 
electrons  is  assigned  to  one  of  the  two  resulting  fragments.  In  order  to  avoid  the  resulting 
radical  species,  a  hydrogen  atom  is  used  to  “cap”  the  dangling  bonds  that  are  created  by 
the  fragmentation.  The  capping  hydrogen  points  in  the  direction  of  the  broken  bond  at  a 
chemically  reasonable  distance.  By  design,  double  or  triple  bonds  are  not  broken, 
keeping  the  relevant  atoms  as  a  part  of  one  functional  group.  For  example,  ethanal  would 
contain  two  functional  groups  (CH3  and  CHO),  keeping  the  carbon  and  oxygen  atoms  of 
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the  carbonyl  together  in  one  group.  After  the  addition  of  the  hydrogen  caps,  the  ethanol 
groups  would  be  CH4,  CH4,  and  H2O,  and  the  ethanal  groups  would  be  CH4  and  CH2O. 

To  gain  a  more  quantitative  understanding  of  the  SFM,  consider  the  general 
example  of  a  linear  molecule  M  containing  K  functional  groups  G, : 

M  =  (4) 

Each  group  G.  is  connected  by  single  bonds  to  adjacent  groups  G._j  and  G.^j .  In  order  to 
separate  the  functional  groups  of  M  into  smaller  fragments,  one  can  imagine  breaking  the 
G._j  -  G.  single  bond,  then  capping  each  new  terminal  atom  with  a  hydrogen  atom.  This 
produces  two  new,  smaller  species. 


M,=G,G,G,...G,_,H,_, 

(5) 

(6) 

The  internal  geometries  of  and  are  preserved,  except  for  the  hydrogen  atoms  that 
have  been  used  to  cap  the  missing  bond  vector.  The  total  energy  can  then  be  written, 
without  approximation,  as: 

E{M)  =  E{M, )  +  E{M^ )  +  ,  (7) 

where  dE^  is  the  correction  for  the  differential  change  in  the  energy  caused  by  breaking  a 
bond  and  adding  two  hydrogen  caps.  This  process  can  be  repeated,  since  bonds  can  be 
broken  at  any  point  along  the  chain,  decomposing  the  full  system  into  many  smaller 
fragments.  As  the  separation  between  the  bond  breaks  is  increased,  the  accuracy  of  the 
SFM  will  increase,  since  the  larger  fragments  will  give  a  more  accurate  description  of  the 
full  system.  The  separation  between  broken  bonds  can  be  described  as  different  “levels” 
of  the  SFM. 

The  SFM  levels  are  defined  as  follows. 
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Consider  the  moleeule  M: 


M  =  (8) 

In  the  Level  1  SFM,  two  bonds  separated  by  just  one  funetional  group  are  sequentially 

broken.  The  fragments  initially  ereated  would,  for  example,  be  as  follows: 

M  »  Gfi^  +  -  Gj  (9) 

The  G2  fragment  is  subtraeted  off  to  conserve  the  number  of  atoms.  Subsequently,  this 

process  is  repeated  exhaustively  on  the  G2G3G4G5G6G7G8  fragment  until  no  fragment 

larger  than  2  functional  groups  remains.  In  the  end,  the  energy  of  M  can  be  approximately 

decomposed  into  the  simple  sum  of  fragment  energies  for  level  1  as  follows: 

^iTent^)  =  E{Gfi,)  +  E(G,G,)  +  E(G,G,)  +  E(G,G,)  +  E{G,G,)  +  E{Gfi,)  +  E{G,G,) 
-E(G,)  -E(G,)  -E(G,)  -E(G,)  -E(G,)  -E(G,) 

(10) 

Similarly  in  the  Level  2  SFM,  bonds  separated  by  two  functional  groups  are  sequentially 
broken  with  the  energy  of  M  being  decomposed  into  the  following  expression: 

=  E{Gfi,G,)  +  E{G,G,G,)  +  E{G,Gfi,)  +  E{G,G,G,)  +  E{G,Gfi,)  +  E(G,G,G,) 
-E{G,G,)  -E{G,G,)  -E{G,G,)  -E{G,G,)  -E{G,G,) 

(11) 

In  the  level  3  SFM,  bonds  separated  by  three  functional  groups  are  sequentially  broken 
with  the  energy  of  M  being  decomposed  into  the  following  expression: 

E“{M)  =  E{Gfififi,)  +  E(G,G,G,G, )  +  E(G,G,G,G,)  +  E(G,G,G,G,)  +  E(G,G,G,G,) 
-E(G,G,G,)  -E(G,G,G,)  -E(G,G,G,)  -E(G,G,G,) 

(12) 

It  is  important  to  note  that  in  the  limit  of  SFM,  that  is  for  level  n,  where  n  is  the  number 
of  groups  in  the  system,  one  would  be  left  with  the  un-fragmented  system.  So  the  higher 
the  SFM  level  employed,  the  larger  the  fragments  and  the  closer  one  should  get  to  the 
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energy  of  the  exaet  un-fragmented  system. 

There  are  some  limitations  of  the  SFM.  First,  as  noted  earlier,  the  SFM  is  unable  to 
fragment  eonjugation  in  deloealized  moleeular  systems.  The  seeond,  less  obvious, 
limitation  is  that  the  SFM  is  unable  to  fragment  six  member  rings  using  level  3  sinee  the 
eapping  hydrogens  would  approaeh  eaeh  other  too  closely  and  would  therefore  cause 
unphysical  repulsive  interactions.  To  avoid  this,  the  ring  must  remain  intact  and  is 
considered  to  be  a  functional  group  itself  Similarly,  five  member  rings  can  only  be 
fragmented  at  level  1;  four  and  three  member  rings  cannot  be  fragmented  at  all.  These 
exceptions  are  referred  to  as  the  ring  repair  rule. 

3.1  Non-bonded  interactions 

The  simplest  approach  to  obtain  the  energy  of  the  system  of  interest  would  be  to 
calculate  the  energies  of  the  individual  hydrogen-capped  fragments  and  sum  them 
accordingly.  The  result  obtained  from  this  procedure  would  differ  greatly  from  the 
analogous  calculation  on  the  full  molecular  system.  This  is  because  the  (non-bonded) 
interactions  among  the  separated  fragments  are  unaccounted  for.  These  non-bonded 
interactions  are  naturally  incorporated  into  the  full  ab  initio  calculation.  The  non-bonded 
interactions  are  modeled  within  the  SFM  framework  by  using  a  modified  many  body 

O'? 

expansion;  this  expansion  relies  on  the  assumption  that  bonded  interactions  are  much 
stronger  than  non-bonded  ones. 

3.2  Two-body  interactions 

The  interaction  energy  between  two  functional  groups  and  Gj  is  given  by 
]  =  E{Gfi,)-  E{G,)-  E(G,),  (13) 

where  ^’(GjGj)  is  the  super-molecular  energy  of  the  two  separated  functional  groups 
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(placed  in  their  positions  in  the  original  full  moleeule  M)  and  E{G^),E{G2)  are  the 
eorresponding  (one -body)  fragment  energies.  The  total  two-body  non-bonded  energy  of 
the  system  eontains  the  energies  of  all  possible  pairs  of  funetional  groups  that  are  not 
deseribed  by  the  fragmentation  of  the  bonded  system  in  the  definition  of  M.  That  is,  all 
pairs  of  groups  Gi,  G2  that  are  not  eontained  in  any  one  fragment. 

3.3  Three-body  interactions 

The  mutual  interaetion  of  three  funetional  groups  G^ ,  Gj  and  G^  is  assumed  to  be 
negligible  unless  any  two  of  the  groups  are  bonded  to  eaeh  other.  For  example,  if  G^  is 
bonded  directly  to  G^  then  the  three-body  interaetion  energy  would  be: 

£“•=>  [G,;Gj,G,]=  £(G,C,C0-  £(G,)-  £(C,G0 
-£i’'>[G,;Gj-£i’"[C,;G,] 

In  other  words,  the  three-body  energy  is  simply  the  super-moleeular  energy,  E{Gfi2Gi)  , 
minus  the  one-body  energies  E{Gy),E{G2G2)  and  minus  the  two-body  energies, 

-^nb  \G,-G2lE^^^  [Gj;G3]  .  The  total  three-body  energy  eonsists  of  all  eombinations 
eontaining  any  group  ( Gj )  with  any  two  bonded  functional  groups  ( Gj  or  G3 ),  so  long  as 
Gi  is  itself  not  present  in  any  bonded  fragment  with  Gj  or  G3 .  This  general  trend  ean  be 
extended  to  four  body  interaetions  and  beyond;  however  for  the  purposes  of  this  work 
only  three-body  terms  will  be  treated. 

The  total  SFM  energy  of  a  system  is  simply  the  addition  of  the  bonded  and  non- 
bonded  energies, 

77total  ^bonded  ,  ^non-bonded  /i 

^SFM=^  ’  (ij) 

where  ineludes  all  terms  up  to  order  from  the  modified  many-body 
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approximation.  For  example,  ealculations  employing  order  many  body  non-bonded 
energies  would  inelude  the  order  many  body  non-bonded  energies  as  well. 

3.4  SFMandEFP 

SFM  molecular  energy  calculations  corresponding  to  bonded  level  3  including 
many  body  non-bonded  interactions  apparently  provide,  on  average,  the  best  balance 
between  accuracy  and  computational  effort.  Although  the  non-bonded  approximation  is 
important  for  reliability,  it  also  hinders  computational  performance  by  significantly 
increasing  the  number  of  ab  initio  terms.  For  example,  moderately  sized  proteins  ('-'3500 
residues)  have  on  the  order  of  10^  non-bonded  interactions.  Because  there  are  so  many 
non-bonded  terms,  these  terms  can  dominate  the  calculation.  It  is  therefore  advantageous 
to  employ  approximate  methods  for  those  non-bonded  interactions  that  are  sufficiently 
distant  that  classical  approximations  might  be  valid.  The  simplest  approach,  using  just 
electrostatic  interactions,  were  used  in  the  original  SFM  implementation.  A  more 
sophisticated  approach,  using  effective  fragment  potentials  (EFP),  is  described  here. 
Compared  to  electrostatics,  intermediate  range  (2.7-4.5A)  EFP  interaction  energies  agree 
better  with  ab  initio  methods.  This  increases  the  number  of  non-bonded  terms  that  can  be 
calculated  with  model  potentials. 

The  determination  of  whether  a  non-bonded  term  is  treated  with  EFP  or  ab  initio 
methods  is  based  on  a  user-defined  cutoff  related  to  the  nearest  atom-atom  distance 
between  fragments.  The  short-range  (<  2.7A)  non-bonded  terms  use  ab  initio  methods, 
while  long-range  (>2.7A)  ones  use  EFP.  The  original  electrostatic  approach^^  used  a 
cutoff  of  4.5A.  This  shortened  EFP  cutoff  comparatively  reduces  the  number  of  ab  initio 
non-bonded  terms,  thereby  decreasing  the  computational  expense. 
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Previously,  Collins  and  Deev  tested  the  SFM  by  ealeulating  the  isomerization 
energies  of  a  set  of  organie  moleeules  (12-44  heavy  atoms)  obtained  from  the  Cambridge 
Struetural  Database. A  subset  of  this  set  of  isomerization  energies  is  examined  here. 
The  energies  that  are  obtained  by  employing  the  EFP/ah  initio  non-bonded  approaeh  are 
compared  with  both  the  fully  ab  initio  energies  (no  SFM)  and  to  the  SFM  in  which  all 
non-bonded  terms  are  calculated  with  the  same  ab  initio  method  that  is  used  for  the 
bonded  terms.  The  ab  initio  calculations  here  employ  both  the  Hartree-Fock  (HF)  and 
second  order  perturbation  theory  (MP2)  levels  with  the  6-31G(d,p)  basis  set.  Additional 
SFM  tests  are  presented  for  a  small  set  of  alpha  helices  using  the  6-31++G(d,p)  basis  set. 
The  larger  6-31 1++G(3df,2p)  basis  set  is  employed  for  creating  all  FFPs  used  for  non- 
bonded  interactions,  since  this  basis  set  has  been  shown  to  produce  reliable  results  and 
since  the  FFP  basis  set  dependence  does  not  significantly  affect  the  computational  cost 
relative  to  ab  initio  calculations.  All  of  the  SFM  calculations  presented  here  correspond 
to  bonding  level  3,  including  up  to  3'^‘*  order  many  body  non-bonded  interactions.  All 
calculations  are  performed  with  the  GAMFSS^"^  electronic  structure  code. 

Given  in  Table  2  are  the  errors  in  the  isomerization  energies.  The  corresponding 
structures  are  depicted  in  Scheme  1 .  It  is  evident  that  the  two  methods  for  treating  the 
SFM  non-bonded  energy  (FFP/ah  initio  and  ab  initio  only)  are  in  reasonable  agreement 
with  the  fully  ab  initio  (non  SFM)  energies,  as  the  mean  absolute  error  (MAF)  in  all 
cases  is  no  more  than  2.5  kcahmol.  Addition  of  the  3’^‘^  order  non-bonded  terms  does  not 
result  in  any  improvement  to  the  MAF.  Interestingly,  the  MAF  for  the  combined  FFP/ah 
intio  approach  for  the  non-bonded  terms  are  slightly  smaller  (-0.1 -0.5  kcal/mol)  than 
those  obtained  when  the  non-bonded  terms  are  evaluated  with  the  ab  initio  method  (HF 
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-5C 

or  MP2).  For  the  21  molecules  of  interest  here,  as  also  noted  by  Collins  and  Deev  ,  no 
improvement  in  the  net  CPU  time  is  observed  since  the  molecules  themselves  are  small. 
Improvements  in  CPU  timings  are  observed  for  larger  molecular  systems  (>100  atoms), 
as  discussed  below. 

SFM  isomer  energies  for  the  larger  model  alpha  helices  (ranging  from  125-170 
atoms)  are  shown  in  Table  3,  with  the  corresponding  structures  presented  in  Scheme  2. 
For  these  systems,  adding  the  higher  order  non-bonded  terms  does  improve  the  SFM 
performance.  The  MAE  improves  by  '-I  kcal/mol  when  the  order  non-bonded  terms 
are  included.  Here  again,  the  SFM  errors  obtained  when  using  the  EFP/ah  initio  non- 
bonded  energies  are  similar  (-'1  kcal/mol  smaller)  to  those  obtained  using  only  ab  inito 
non-bonded  terms.  Table  4  compares  the  CPU  times  for  using  the  EEP  method  for  non- 
bonded  terms  with  those  required  for  the  ab  initio-on\y  SEM.  The  time  needed  to 
generate  the  EEP  terms  is  also  listed.  This  time  becomes  significant  when  the  3’^‘^  order 
many  body  terms  are  included.  Eurther,  since  the  EEP  generation  requires  only 
calculations  at  the  Hartree-Eock  level  of  theory,  the  contribution  of  the  EEP  generation  to 
the  overall  computation  time  will  greatly  decrease  in  importance  when  more  accurate 
electronic  structure  methods  are  used. 

Nonetheless,  as  shown  in  Table  4,  employing  EEP  to  treat  a  portion  of  the  SFM 
3’^‘^  order  non-bonded  terms  results  in  an  overall  decrease  in  CPU  time  by  roughly  a  factor 
of  two.  Including  only  the  2^^  order  non-bonded  EFP/ah  initio  terms  gives  energies  in 
good  agreement  with  the  full  un-fragmented  energies  (Table  3:  MAE  =  2.6  kcal/mol),  but 
the  gain  in  computational  efficiency  is  small,  -'5-15%  less  CPU  time.  The  advantage  of 
using  the  EEP/ah  initio  approach  is  clearly  seen  in  Table  5,  where  the  number  of  non- 
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bonded  terms  that  must  be  eomputed  ab  initio  is  eompared  for  the  EFP/ah  initio  and  the 
eleetrostatie/ah  initio  methods.  Since  the  EFP  method  is  more  effective  at  capturing 
interaction  energies  than  electrostatics  at  close  range,  the  EFP  non-bonded  cutoff  can  be 
set  to  the  shorter  distance  of  2.1  K,  instead  of  4.5A.  This  shorter  cutoff  value  reduces  the 
number  of  expensive  ab  initio  non-bonded  terms  by  up  to  85-90%,  while  still  retaining 
good  accuracy.  This  increase  in  efficiency  will  be  especially  important  when  high  levels 
of  theory,  such  as  MP2  or  coupled  cluster  methods,  are  employed  to  treat  large  molecular 
systems.  A  major  advantage  of  the  SFM  (and  other  fragment-like  methods)  is  that  it 
enables  very  accurate  calculations  on  large  molecular  systems  that  would  otherwise  be 
impossible.  As  noted  above,  since  the  EFP  generation  requires  only  Hartree-Fock  level 
calculations,  the  contribution  of  the  EFP  generation  to  the  overall  computation  time  will 
greatly  decrease  in  importance  when  more  accurate  electronic  structure  methods  are  used. 
4.  The  Fragment  Molecular  Orbital  (FMO)  Method 

The  FMO  method^^'"^^  relies  upon  the  assumption  that  electron  exchange  and 
charge  transfer  are  largely  local  phenomena  in  chemical  systems.  By  breaking  a  system 
into  fragments  and  treating  the  long-range  interactions  in  a  system  using  only  a  Coulomb 
operator,  there  are  significant  reductions  in  computational  expense.  In  addition  to  this 
initial  reduction  in  computational  cost,  the  FMO  method  is  further  enhanced  with  the 
generalized  distributed  data  interface  (GDDI)."^^  The  GDDI  uses  a  two-level 
parallelization  scheme,  assigning  individual  fragment  calculations  to  different  groups, 
each  group  performing  its  fragment  calculation  in  parallel.  The  FMO  method  has  also 
been  interfaced  with  the  polarizable  continuum  model  (PCM)^^  and  the  effective 
fragment  potential  (EFP)  for  the  inclusion  of  solvent  effects.  There  is  also  a  multi-layer 
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fn 

FMO  (MFMO)  implementation  that  allows  for  the  use  of  different  wavefunetion  types 
for  different  fragments.  The  combination  of  the  long-range  approximations  to  the  system 
and  the  GDDI  parallelization  helps  to  facilitate  the  treatment  of  large  molecular 

.  45,68 

systems. 

Creating  fragments  in  the  FMO  method  involves  breaking  bonds  electrostatically, 
assigning  two  electrons  of  a  covalent  bond  to  one  fragment  and  none  to  the  other,  with 
the  fragment  choice  relying  on  the  chemical  intuition  of  the  user.  To  avoid  the  charged 
species  created  by  such  a  fragmentation  scheme,  a  proton  from  the  electron  deficient 
fragment  is  reassigned  to  the  electron  rich  species,  creating  two  neutral  fragments 
(indicated  by  the  “1”  and  “5”  in  Figure  4).  The  “1”  and  “5”  in  the  figure  both  carry  sp^ 
hybrid  orbitals,  to  maintain  the  carbon  character.  The  individual  fragment  (monomer) 
calculations  are  performed  in  the  presence  of  a  Coulomb  “bath”  that  represents  the 
electrostatic  potential  (ESP)  of  the  system  (Figure  5).  Significant  improvements^^’^*^  to 
this  description  of  the  FMO  method  have  been  obtained  by  including  many-body  effects. 
Two-body  (dimers;  called  FM02)  and  three-body  (trimers;  called  FM03)  interactions  of 
the  monomers  are  fully  quantum,  therefore  all  interactions  are  included  in  each 
calculation. 

To  calculate  the  energy  of  a  system  within  the  FMO  method,  first  the  initial 
electron  density  distribution  is  calculated  for  each  monomer  in  the  Coulomb  bath  of  the 
system.  The  monomer  Fock  operators  are  constructed  and  the  energy  of  each  monomer 
is  calculated.  Each  of  the  monomer  energies  is  iterated  to  self-consistency  in  this 
manner,  leading  to  the  convergence  of  the  ESP. 
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The  total  energy  of  a  ehemical  system,  within  the  FMO  approximation,  can  be  written  as; 


I  1>J 


+  i  {iEu,-E,-E,-E,)-{E^-E,-E,)  (10) 

I>J>K 

~  iEjic  ~  Ej  —  Ef.)  —  {Egj  —  Ef.  —  -Ej)}  + ... 

where  monomer  (7),  dimer  (77)  and  trimer  {UK)  energies  are  obtained  using  the  standard 
SCF  method.  Despite  the  seeming  simplicity  of  Eq.  (10),  the  FMO  method  encapsulates 
the  deeper  ideas  of  properly  handing  many-body  effects,  as  clarified  in  the  diagrammatic 
treatment^^  and  the  Green’s  function  formalism. This  is  a  very  important  distinction 
between  the  FMO  and  other  methods.  The  Fock  equation; 

x  =  lJJJJK  (11) 

r=H"+G"  (12) 


is  modified  from  the  standard  form  with  the  addition  of  a  term,  that  represents  the 


ESP  to  the  one-electron  Hamiltonian  H’' . 

«;. = w;. + + sZ(Ak;)(«>‘ F> 

i 


(13) 


The  modified  Hamiltonian  also  contains  the  projection  operator, 

i 

needed  for  division  of  basis  functions  along  the  fractioned  bonds,  where  B  is  a  constant 
chosen  to  be  sufficiently  large  to  remove  the  corresponding  orbitals  out  of  variational 
space  (normally  5=10^  a.u.). 

The  ESP  of  the  system  takes  the  form; 

E  (<.+<,.)  (14) 

K(^x) 
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(15) 


v=  Dk) 

(16) 

ActeK 

where  the  first  term  is  the  nuelear  attraction  contribution  and  the  second  term  is 

the  two-electron  contribution,  both  of  which  are  calculated  for  each  of  the  surrounding 
monomers  K  with  electron  density  D^. 

4.1  FMO  Approximations 

The  formulation  of  the  energy  described  above  has  limitations"^"^’^*^,  such  as  the 
increasing  cost  of  the  two-electron  term  in  the  ESP.  To  reduce  this  cost,  different 
approximations  can  be  used  to  treat  the  ESP  by  creating  a  cut-off  value  Rapp.  Outside  this 
cut-off  the  two-electron  terms  of  the  ESP  can  be  treated  in  a  more  approximate  way. 
However,  the  foregoing  energy  formulation  loses  some  accuracy  with  such 
approximations,  because  the  balance  among  the  approximations  in  different  EMO  terms 
may  be  lost.  Eor  example,  if  there  are  three  monomers  7,  J  and  L  with  some  distance 
based  approximation  {Rapp)  applied,  and  the  relative  distances  are  as  illustrated  in  Eigure 
6,  then  the  electrostatic  interaction  of  monomers  1  and  L  would  be  treated  using  the 
approximation,  while  the  interaction  of  monomers  J  and  L  would  be  treated  with  the  full 
ESP.  However,  there  would  be  an  interaction  of  dimer  IJ  with  monomer  L  without  any 
approximations,  (because  L  is  close  to  IJ  and  J,  but  far  from  7).  This  causes  a  loss  of 
balance  among  some  of  the  dimer  energy  terms  in  the  expression 

N 

(17) 

I>J 
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for  those  dimers  IJ  in  which  some  ESP  contributions  (i.e.,  those  for  fragment  L)  included 
in  E[  are  treated  using  the  approximation,  but  in  others  they  are  not.  There  can  be  a  great 
many  dimer  contributions  to  the  energy  in  a  single  calculation,  causing  significant  loss  of 
accuracy  in  the  energy  of  the  full  system. 

The  issue  described  above  requires  a  reformulation  of  the  energy  that  is 
equivalent  to  Eq.  (10),  but  more  accurate  if  approximations  to  the  ESP  are  used^"^.  Eor 


PM02, 


£»,,  =  -e;  -  -B;  )  +  Z  {J>'(D"V")  -  Tr(B‘V )  -  (D"V^ )} 
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A  similar  expression  has  been  derived  for  PM03. 

The  new  energy  terms  E'^  are  defined  as  the  internal  energies  of  the  monomers  and 
dimers  with  the  ESP  contributions  subtracted  out; 

E[=E^-  TKD'V")  X  =  I,JJJ  (19) 

This  is  accomplished  by  contracting  with  the  electron  density  AD’^  is  the 
difference  density  matrix,  defined  as 


AD"  =  D"-D'  ©D^  = 


d''  d" 

V  d-"  d"  y 


d^  0 
0  0 


0  0 
0  d^ 


(20) 


where  d^^,  d^'^,  d'^^  and  d'^'^  are  blocks  of  D",  and  d^  (d'^)  is  simply  equal  to  (D'^).  This 
formulation  makes  it  possible  to  calculate  the  total  energy  explicitly  from  only  the  dimer 
ESP  By  subtracting  the  monomer  and  dimer  ESPs  in  the  energy  expression, 

approximations  can  be  applied  to  the  monomers  and  dimers  separately.  The  dimer  ESP 
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then  directly  contributes  to  the  total  energy,  while  the  monomer  ESP  determines  the 
monomer  electron  densities,  only  contributing  to  the  total  energy  indirectly. 

Two  different  levels  of  approximation  are  currently  used  in  the  FMO  method, 
enabled  by  equation  (18).  For  intermediate  distances  the  Mulliken  approximation  to  the 
two-electron  integrals  is  used.  Equation  (16)  can  then  be  rewritten  as; 

(21) 

This  approximation  reduces  the  computational  cost  of  the  two-electron  integrals  by  a 
factor  of  Nb  (number  of  basis  functions). 

The  fractional  point  charge  approximation,  using  the  Mulliken  atomic  populations 
of  the  monomers,  is  used  for  long  distances.  The  two-electron  term  of  equation  (16)  is 
then  simplified  as; 

<,-Z(-“i(a(i>-->-Ai)i>')  (22) 

reducing  the  computational  cost  of  the  two-electron  integrals  by  another  factor  of  Nb- 

Inter-fragment  interactions  have  a  similar  approximation  that  evaluates  the 
electrostatic  contribution  to  the  energy  using  the  monomer  densities  for  far  separated 
dimers,  instead  of  calculating  the  dimer  density  itself  This  contribution  is  added  to  the 
dimer  energy  as 

i-;,  =  i-;  +  £;  +  rKD'«‘'<-'')  +  rr(Z)"«‘''''>)+y  yz);z);,(^,-|/xT)  (23) 

juv^I  pa^J 

where  are  one-electron  Coulomb  potentials  of  the  force  exerted  by 

fragment  J  on  fragment  I,  and  fragment  I  on  fragment  J,  respectively. 

Other  approximations  of  the  same  nature  are  implemented  for  correlated  dimers 
and  trimers,  where  the  corresponding  corrections  for  far  separated  pairs  and  triples  of 
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fragments  are  negleeted/^’^"^.  Formal  definitions  and  deseriptions  of  the  trimer 
interactions  and  cut-offs  used  in  FM03  have  been  described  previously^^’^^  and  will  not 
be  discussed  here.  All  of  these  approximations  are  based  on  a  distance  definition  Rapp, 
defined  as  the  minimum  distance  between  atoms  in  n-mer  1  and  monomer  J  divided  by 
the  sum  of  their  van  der  Waals  radii. 

There  have  been  several  new  developments  in  the  FMO  theory  that  cannot  be 
discussed  in  detail  here.  Nonetheless,  it  is  useful  to  mention  a  few  of  them  briefly.  As  an 
alternative  to  the  original  bond  fragmentation  scheme,  in  which  the  electron  density 
describing  the  detached  bonds  is  variationally  optimized,  a  new  scheme  has  been 
suggested  in  which  this  density  is  obtained  for  a  model  system  and  is  kept  frozen  in 
fragment  calculations  .  This  new  scheme  has  been  shown  to  work  well  for  covalent 
crystals  such  as  zeolites.  The  FMO  method  has  also  recently  been  implemented  for  the 
study  of  excited  states  ,  using  multi-configurational  self-consistent  field  (MCSCF) 
theory,  configuration  interaction  (Cl),  and  time-dependent  density  functional  theory 
(TDDFT). 

4.2  FM02  and  FM03  Calculations  on  (H20)32  Clusters 

The  unusual  characteristics  of  liquid  water  make  it  both  very  important  to 
chemical  processes  and  particularly  difficult  to  model  accurately.  The  structure  of  small 
(H20)n  (n=6  through  20)  clusters  have  recently  been  determined  using  coupled  cluster 
theory;  however,  the  ability  to  model  water  clusters  larger  than  this  at  the  same  level  of 
theory  is  nearly  impossible.  The  FMO  method  provides  a  way  to  model  much  larger 
water  clusters  at  high  levels  of  theory  such  as  CCSD(T),  while  keeping  the  computational 
cost  manageable. 
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In  the  present  work,  calculations  of  the  energies  of  (H20)32  water  clusters  are 
reported,  using  fully  ab  initio  Moller-Plesset  second  order  perturbation  theory  (MP2)  as 
well  as  the  MP2  implementation  of  the  FMO  method^^’^^.  For  these  clusters  a  fragment 
(monomer)  is  defined  as  one  water  molecule  for  both  FM02  and  FM03  calculations. 
Initial  structures  were  obtained  from  EFP  Monte  Carlo/Simulated  Annealing  (MC/SA) 
simulations  followed  by  EFP  optimizations  of  a  representative  set  of  structures.  The 
MC/SA  method  with  local  minimization  was  used  to  sample  the  configuration  space.  For 
each  global  minimum  found,  the  number  of  structures  sampled  was  on  the  order  of 
500,000  -  1,100,000.  The  number  of  steps  taken  for  each  temperature  was  varied  (100, 
500,  1000,  10,000),  along  with  changing  the  number  of  steps  between  local 
minimizations  (10,  100,  1,000).  The  number  of  fragments  moved  per  step  was  also  varied 
between  one  and  five.  The  starting  temperature  for  the  simulated  annealing  varied  from 
500  to  20,000  K  and  the  final  temperature  was  kept  at  300  K.  This  selection  of  isomers 
(the  lowest  energy  structure  is  shown  in  Figure  7)  was  used  to  investigate  the  accuracy  of 
the  FMO  method  by  comparing  both  absolute  and  relative  energies. 

Average  errors  for  the  FM02-MP2  calculations  (Table  6)  using  the  6-31++G(d,p) 
basis  set  are  very  consistent,  around  12  kcal/mol.  The  FM03-MP2  results  illustrate  the 
importance  of  three-body  interactions  in  water  clusters,^^’^^  again  with  very  consistent 
errors  of  ~2-3  kcal/mol  (Table  6).  Comparing  results  between  basis  sets  in  Table  6,  when 
the  basis  set  size  is  increased  to  6-311++G(3df,2p),  the  FM02  errors  double  to  ~24-28 
kcal/mol,  while  the  FM03  errors  are  cut  in  half  to  '-I  kcal/mol.  This  increase  in  errors 
with  an  increased  basis  set  size  for  the  two-body  FMO  method  could  be  due  to  an 
increased  importance  of  three-body  contributions  when  the  better  basis  set  is  used.  The 
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larger  basis  set  also  provides  a  better  description  of  three-body  interactions,  making  the 
lack  of  these  interactions  in  FM02  even  more  detrimental. 

Despite  the  large  absolute  errors  present  in  the  FM02  description  of  water 
clusters,  the  relative  energetics  of  the  different  isomers  is  captured  quite  well.  On 
average,  the  FM02  relative  energies  are  in  agreement  with  full  ab  initio  results  to  within 
-'1-2  kcal/mol  with  both  basis  sets,  shown  in  Table  7.  The  error  increases  for  FM02  as 
the  relative  energy  of  the  isomers  increases,  showing  an  increased  importance  of  three 
body  contributions  with  higher  energy  isomers.  For  both  basis  sets,  the  FM03  results  are 
within  -O.S  kcal/mol  or  less  for  all  isomers  as  shown  in  Table  7,  effectively  removing  the 
error  from  the  two  body  description  used  in  FM02. 

4.3  The  FMO  Method  Applied  to  Ionic  Liquids 

80  8 1  82 

Previous  studies  of  ionic  liquids  ’  ’  have  focused  on  the  decomposition  of  ion 
pairs  (Figure  8),  providing  insight  into  the  chemistry  of  their  ignition  as  high  energy 
fuels.  The  focus  of  this  paper,  however,  will  be  to  accurately  describe  larger  systems 

0-5 

beyond  single  anion-cation  pairs.  Recent  work  by  Li  et.  al.  has  provided  an  accurate 
structure  of  two  ion  pairs  (two  cations  and  two  anions),  providing  a  greater  understanding 
of  the  molecular  structure  and  intermolecular  interactions.  The  same  system  will  be 
modeled  here,  along  with  systems  of  three  ion  pairs  to  illustrate  the  effectiveness  of  the 
FMO  method  in  accurately  describing  complex  molecular  clusters,  with  the  goal  of 
modeling  much  larger  systems  in  the  future. 

Two  ionic  liquid  systems,  l-H,4-H-l,2,4-triazolium  dinitramide  and  1 -amino,  4- 
H-l,2,4-triazohum  dinitramide  (Figure  8),  were  studied  using  both  ab  initio  Moller- 
Plesset  second  order  perturbation  theory  (MP2)  and  the  MP2  implementation  of  the  FMO 
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method^^’^^  with  one  ion  chosen  as  an  FMO  fragment  or  monomer.  Structures  composed 
of  two  cations  and  two  anions  (tetramers),  shown  in  Figures  9  and  11,  and  larger  clusters 
of  three  cations  and  three  anions  (hexamers),  shown  in  Figures  10  and  12,  were  optimized 
at  the  MP2/6-31+G(d)  level  of  theory.  FM02-MP2  and  FM03-MP2  single  point  energy 
calculations  were  then  performed  for  comparison  with  the  fully  ab  initio  results.  Mulliken 
charges  on  each  cation  and  anion  were  also  compared  to  ensure  that  the  pronounced 
charge  separation  present  in  ionic  liquids  ’  ’  is  captured  using  the  FMO  method. 

Comparing  the  energies  from  FM02  and  FM03,  it  can  be  seen  immediately  that 
the  FMO  method  captures  the  total  energy  very  well,  within  2  kcal/mol  in  the  worst  case 
(Table  8).  For  the  tetramers,  both  FM02  and  FM03  are  in  good  agreement;  the  FM02 
errors  are  less  than  1  kcal/mol  relative  to  the  fully  ab  initio  results.  For  the  hexamers,  the 
FM02  errors  are  less  than  2  kcal/mol,  illustrating  that  FM03  is  not  required  to  achieve 
the  desired  level  of  accuracy  for  these  particular  ionic  liquid  systems.  Whether  this  trend 
persists  as  system  size  grows  beyond  three  ion  pairs,  or  for  other  ion  pairs,  must  be  tested 
further. 

80  8 1  82 

As  shown  in  previous  studies  ’  ’  ,  ionic  liquid  ion  pairs  have  a  definite 
separation  of  charge  (approximately  +1  on  the  cations,  -1  on  the  anions)  at  equilibrium 
geometries.  This  charge  separation  is  also  observed  for  tetramers,  as  shown  in  Table  9, 
and  the  charge  separation  between  cations  and  anions  is  still  present  up  to  hexamer 
structures.  FM02  captures  the  qualitative  charge  separation  quite  well,  however,  the 
magnitude  of  charge  present  on  both  cations  and  anions  is  slightly  overestimated  by 
FM02  for  both  tetramer  structures  (Table  9).  However  as  the  system  size  increases  to 
three  ion  pairs,  the  difference  between  FM02,  FM03  and  ab  initio  results  becomes 
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minimal.  Future  work  using  larger  basis  sets  will  help  determine  if  FM02  is  accurate 
enough  to  describe  larger  ionic  liquid  clusters  or  if  FM03  will  be  required. 

Another  consideration  for  larger  molecular  systems  is  the  computer  time  required. 
To  illustrate  the  overall  effectiveness  of  the  FMO  method  in  both  providing  accurate 
results  and  reducing  computational  requirements,  timings  were  performed  for  the  ionic 
liquid  systems  described  above.  Due  to  the  fact  that  FM02  is  in  good  agreement  with  the 
ab  initio  results,  only  timings  for  FM02  will  be  shown.  However,  it  is  noted  here  that 
because  the  tetramers  and  hexamers  examined  here  are  small,  the  FM03  timings  for  these 
systems  do  no  exhibit  any  time  savings  relative  to  the  full  MP2  calculations.  The  benefit 
of  using  FM03  is  only  seen  with  larger  systems  . 

Timings  were  performed  on  a  Cray  XT4  supercomputer  using  AMD  Opeteron64 
processors  running  at  2.1  GHz,  located  at  the  U.S.  Army  Engineer  Research  and 
Development  Center  (ERDC)  in  Vicksburg,  Mississippi.  Single  point  Moller-Plesset 
second  order  perturbation  theory  (MP2)  energy  calculations  were  performed  using  8,  16 
and  32  processors  with  both  PM02  and  MP2  using  the  6-31+G(d)  basis  set.  As  shown  in 
Table  10,  PM02  requires  approximately  half  the  computer  time  of  a  full  MP2  calculation 
on  the  tetramers.  With  the  increase  in  available  processors,  the  overall  time  requirements 
are  cut  in  half  for  both  PM02  and  MP2,  showing  good  scalability  for  both  methods. 
With  an  increase  in  system  size  from  ionic  liquid  tetramers  to  hexamers,  the  computer 
time  required  for  a  fully  ab  initio  calculation  increases  more  than  6  fold,  while  the  PM02 
requirement  only  doubles.  So,  the  PM02  cost  savings  relative  to  full  MP2  is  much 
greater  than  that  observed  for  the  tetramers.  Again,  scalability  for  both  methods  is  very 
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good  for  the  hexamers,  cutting  the  computational  time  in  half  when  doubling  the  number 
of  available  CPUs. 

It  is  apparent  that  as  the  system  size  increases  to  larger  ionic  liquid  clusters,  or  as 
the  basis  set  size  increases  (or  both)  the  computational  requirements  for  a  fully  ab  initio 
calculation  will  rapidly  and  increasingly  exceed  those  for  FM02.  It  may  be  that  as  the 
system  size  increases,  the  importance  of  three-body  contributions  to  the  interaction 
energy  will  also  increase,  requiring  the  use  of  FM03.  Future  work  will  determine  the 
importance  of  three-body  terms  in  ionic  liquid  systems,  as  well  as  the  ability  of  the  FMO 
method  to  describe  larger  molecular  clusters. 

5.  Summary  and  Conclusions 

Obtaining  chemical  accuracy  (1  kcahmol)  using  model  chemistries  has  been  a 
major  focus  of  quantum  chemistry  research  for  the  last  quarter  of  a  century.  The  desire  to 
study  larger  systems  in  order  to  capture  novel  chemical  phenomena  (e.g.  solvent  effects, 
surface  science,  enzyme  and  heterogeneous  catalysis  and  polymerization  phenomena), 
including  the  kinetics  and  dynamics  of  such  processes,  often  requires  very  accurate 
predictions  of  potential  energy  surfaces  for  subsequent  predictions  to  be  even 
qualitatively  correct.  The  computational  effort  of  traditional  methods  such  as  Hartree- 
Fock  (HF),  density  functional  theory  (DFT),  order  perturbation  theory  (MP2),  and 
coupled  cluster  theory  with  perturbative  triples  (CCSD(T))  scale  as  0(n"^),  0(n"^),  O(n^), 
and  O(n’),  respectively,  where  n  represents  the  size  of  the  system,  e.g.,  the  size  of  the 
basis  set.  In  practice,  this  limits  the  sizes  of  systems  that  can  be  studied  with  HF/DFT, 
MP2  and  CCSD(T)  to  approximately  a  few  hundred,  one  hundred,  and  twenty  non¬ 
hydrogen  atoms,  respectively.  By  developing  highly  parallel  algorithms,  the  goal  of  using 
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sophisticated  electronic  structure  methods  to  investigate  large  moleeular  problems 
becomes  more  feasible,  espeeially  if  one  has  access  to  massively  parallel  eomputing 
hardware.  However,  scalability  beyond  hundreds  to  a  few  thousand  proeessors  is 
generally  a  serious  bottleneek  for  correlated  electronic  structure  methods.  Consequently, 
parallel  computing  is  not  the  sole  solution  to  enabling  aceurate  caleulations  on  extended 
molecular  systems;  other  approaehes  are  needed.  If  one  is  interested  in  performing  long¬ 
time  simulations  at  reliable  levels  of  theory,  the  situation  is  only  exaeerbated. 

Pioneering  work  by  Warshel^**  and  others^  introduced  hybrid  methods  that  employ 
both  quantum  meehanics  (QM)  and  molecular  mechanics  (MM),  leading  to  the  now 
ubiquitous  QM/MM  approaeh.  Importantly,  the  QM/MM  approach  is  quite  general,  so  it 
can  be  employed  with  any  level  of  QM,  ineluding  the  fragmentation  methods  that  have 
been  the  primary  focus  of  the  present  work.  Modern  fragmentation  methods  have  their 
roots  in  ideas  from  Murrel  (1955)"^^'^  and  Christoffersen  (1971)."^^’’  More  reeently 
developed  fragmentation  methods,  sueh  as  the  fragment  moleeular  orbital  (FMO)  method 
and  the  systematie  fragmentation  method  (SFM),  are  now  beeoming  capable  of  achieving 
ehemical  accuracy  for  extended  moleeular  systems. 

The  effective  fragment  potential  (EFP)  method  has  been  developed  to  model 
non-bonded,  intermolecular  interactions.  There  are  two  related  implementation  of  the 
EFP  method:  The  original  method,  called  EFPl,  was  developed  specifically  to  study 
aqueous  solvent  effeets  on  chemical  processes.  The  more  recently  developed  EFP2 
method  is  completely  general,  in  the  sense  that  an  EFP2  is  generated  automatically  by  a 
simple  GAMESS  run,  so  it  contains  no  empirically  fitted  parameters.  The  Coulomb  and 
induction  terms  are  common  to  EEPl  and  EEP2,  but  the  remaining  terms  in  EEP2  are 
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derived  from  first  prineiples.  Onee  an  EFP2  has  been  built  for  a  speeifie  system,  the 

evaluation  of  EFP-EFP  interactions  requires  a  small  fraction  of  the  computational  cost 

compared  to  the  fully  QM  calculation.  The  EFP  computational  effort  scales  in  the  range 

of  quadratic  to  linear  with  an  increasing  number  of  fragments.  EFP1/MP2  achieves  an 

accuracy  of  '-I  kcal/mol  for  the  relative  energies  of  six-water  clusters  compared  to 

CCSD(T)/aug-cc-pVTZ.'^’'’  For  benzene  dimer  binding  energies,  EFP2  achieves  an 

accuracy  of  '-I  kcal/mol  relative  to  CCSD(T)/aug-cc-pVTZ  results.  The  EFPl  method 

has  been  interfaced  with  the  QM  methods  HF,'^  DFT,^"^  MCSCF,'^  singly  excited 

configuration  interaction  (CIS),^^  and  time-dependent  density  functional  theory 

(TDDFT)^’  within  the  GAMESS  suite,  so  EFPl  is  a  fully  QM/MM  method.  The  EFP2- 

1 8 

QM  integration  is  currently  under  development. 

The  SFM  fragments  a  molecule  based  on  the  number  of  single  bonds  in  each 
fragment,  while  considering  the  environmental  effects  of  distant  parts  of  the  system  via  a 
many  body  expansion  of  the  interactions  not  captured  by  the  internal  energies  of  the 
fragments.  This  framework  allows  the  SFM  to  be  widely  applicable  with  a  simple  user 
interface,  which  has  been  integrated  into  the  GAMESS  suite.  The  SFM  has  been  used  to 

-5C  -5'7 

study  small  and  medium  sized  organic  molecules  ,  as  well  as  crystals.  In  this  paper,  it 
was  demonstrated  that  the  SFM,  when  using  EFPs  for  the  non-bonded  interactions,  has  a 
mean  averaged  error  of  1.8  kcal/mol  for  several  a-helical  isomers  at  the  HF/6- 
31++G(d,p)  level  of  theory.  The  SFM  is  independent  of  the  ab  initio  methods  used  in 
calculations  of  the  fragments,  thereby  facilitating  highly  accurate  energies  and  relative 
energies  with  nearly  linear  scaling  as  the  size  of  the  system  is  increased.  The  time 
requirements  for  the  EFP  part  of  a  SFM  calculation,  when  EFPs  are  used  for  the  non- 
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bonded  interactions,  are  determined  by  the  cost  of  an  initial  HF  single  point  calculation. 
Therefore,  the  EFP  fraction  of  the  overall  computer  time  requirements  decreases  rapidly 
as  the  level  of  ab  initio  theory  increases  (e.g.,  from  HF  to  MP2  to  CCSD(T)). 

The  FMO  method  treats  each  fragment  (monomer,  dimer,  etc.)  in  a  Coulomb  bath 
that  represents  the  remainder  of  the  full  system.  The  energy  of  each  monomer  is  iterated 
to  self-consistency  within  this  Coulomb  bath.  The  FMO  method  is  very  flexible  with 
regard  to  the  definition  of  fragments  (i.e.,  monomers),  the  assignments  of  distance  cutoff 
parameters,  and  the  level  of  many-body  effects  (i.e.,  dimer,  trimer,  etc.)  to  be  included  in 
the  calculation.  Combined  with  the  avoidance  of  capping  procedures,  this  facilitates  the 
study  of  a  wide  variety  of  systems  including  clusters,  zeolites,  and  proteins,  and  the 
ability  to  balance  accuracy  and  computational  efficiency.  Within  the  GAMESS  suite,  the 
EMO  method  has  been  interfaced  with  the  polarizable  continuum  method  and  the  EEP 
method  for  studies  of  solvent  effects  on  chemical  processes.  Each  monomer  in  a 
molecular  system  of  interest  can  be  treated  by  most  traditional  electronic  structure 
methods.  In  the  present  work,  the  EMO  method  has  been  shown  to  achieve  accuracy 
within  1  kcahmol  for  both  ionic  liquid  systems  and  water  clusters. 

The  EFP  method  provides  a  systematically  improvable  description  of  non-bonded 
interactions,  while  the  FMO  method  and  the  SFM  facilitate  the  description  of  large 
molecular  systems  with  high  levels  of  accuracy.  The  interface  of  the  two  fragmentation 
methods  for  internal  and  near-field  ab  initio  calculations  with  the  EFP  method  for  non- 
bonded  moderate  and  far-field  interactions  and  for  solvent  effects  provides  a  powerful 
and  computationally  effective  combination.  Additionally,  the  ability  of  these  methods  to 
take  advantage  of  the  standard  theoretical  electronic  structure  framework  allows  their 
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capabilities  to  move  forward  with  new  advanees  in  eleetron  correlation,  wavefunction 
description  and  basis  set  development. 

An  important  advantage  of  the  FMO  and  SFM  approaches  deseribed  here  is  their 
ability  to  take  great  advantage  of  massively  parallel  computers.  Beeause  the  energy  of 
eaeh  fragment  can  be  computed  essentially  independently,  eaeh  fragment  calculation  can 
be  determined  on  a  separate  eompute  node.  Further,  beeause  most  of  the  algorithms  used 
in  GAMES  S  for  eleetronic  structure  functionalities  are  themselves  highly  scalable,  the 
fragment-based  ealculations  ean  take  advantage  of  multi-level  parallelism.  This 
eapability,  whieh  is  enhanced  by  middleware  developments  like  the  generalized 
distributed  data  interfaee  (GDDI),  bodes  well  for  the  implementation  of  algorithms  for 
“petascale”  computers  that  are  expected  to  eome  on  line  within  the  next  2-3  years. 
Simultaneous  advancements  in  new  approaches  like  the  fragmentation  methods  discussed 
here,  novel  parallel  algorithms,  ab  initio  theory,  and  novel  approaehes  in  hardware 
development  are  all  required  if  one  is  to  successfully  address  the  grand  ehallenge 
problems  in  the  ehemical  sciences,  biological  sciences  and  materials  scienee  and 
engineering. 
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Table  1.  Binding  energies  (keal/mol)  and  equilibrium  separations  R  (A)  of  benzene 
dimer  struetures. 


Method 

Basis  set 

Sandwieh 

T-Shaped 

Parallel-displaced 

R 

Energy 

R 

Energy 

Ri 

R2 

Energy 

MP2" 

aug-cc-pVDZ*’ 

3.8 

-2.83 

5.0 

-3.00 

3.4 

1.6 

-4.12 

aug-ce-pVTZ 

3.7 

-3.25 

4.9 

-3.44 

3.4 

1.6 

-4.65 

aug-cc-pVQZ’’ 

3.7 

-3.35 

4.9 

-3.48 

3.4 

1.6 

-4.73 

CCSD(T)" 

aug-cc-pVDZ*’ 

4.0 

-1.33 

5.1 

-2.24 

3.6 

1.8 

-2.22 

aug-cc-pVQZ*’ 

3.9 

-1.70 

5.0 

-2.61 

3.6 

1.6 

-2.63 

EFP 

6-311++G(3df,2p) 

4.0 

-2.11 

5.2 

-2.50 

3.8 

1.2 

-2.34 

a  Reference  56 

b  Basis  sets  as  described  in  Ref.  56 
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Table  2.  Mean  absolute  errors  of  isomerization  energies  (keal/mol)  ealeulated  by  SFM, 
relative  to  fully  ab  initio  (no  SFM)  energies.  The  non-bonded  terms  use  the  eombined 
EFV/ab  initio  approximation  (cutoff  >2. VA).  Given  in  parentheses  is  SFM  with  the  non- 
bonded  term  fully  ab  initio  (no  approximation). 

2"'^  Order  Many  Body  3rd  Order  Many  Body 

_ 6-31G(d,p) _ 6-31G(d,p) 


Isomer 

HF  kcal/mol 

MP2  kcal/mol 

HF  kcal/mol 

MP2  kcal/mol 

ODETAS-AHALUQ 

0.0 

(0.5) 

0.6 

(0.1) 

0.6 

(0.2) 

0.4 

(0.3) 

ODETASOl-AHALUQ 

0.5 

(1.2) 

0.0 

(0.7) 

0.9 

(0.4) 

0.0 

(0.1) 

BAZGEP-BAZGIT 

0.4 

(0.6) 

0.4 

(0.2) 

0.2 

(0.5) 

0.3 

(0.3) 

BELDIF-NOTGAE 

2.6 

(2.6) 

4.6 

(5.1) 

2.6 

(2.4) 

4.6 

(4.9) 

FDOURDOl-BOFWIC 

0.5 

(0.7) 

0.2 

(0.9) 

0.1 

(0.2) 

0.5 

(1.6) 

CONBAI-FDMUPDIO 

0.7 

(0.3) 

1.1 

(2.4) 

3.5 

(4.2) 

2.9 

(5.4) 

IDUFES-IDUFAO 

1.6 

(2.1) 

3.1 

(5.3) 

0.7 

(0.2) 

1.8 

(1.1) 

LEDRAN-LEDRER 

1.0 

(0.8) 

1.0 

(2.6) 

1.7 

(1.9) 

1.6 

(2.8) 

LEDRIV-LEDRER 

1.9 

(1.5) 

0.3 

(0.0) 

0.5 

(0.4) 

1.4 

(1.0) 

TAXYIA-MOGQOO 

1.3 

(0.4) 

11.1 

(8.4) 

0.0 

(1.2) 

9.7 

(6.8) 

TAXYOG-MOGQOO 

1.6 

(2.1) 

1.1 

(3.7) 

2.0 

(2.6) 

1.5 

(4.2) 

WINXIA-XEXXIH 

0.3 

(0.3) 

0.1 

(0.1) 

0.3 

(0.3) 

0.1 

(0.1) 

MAE 

1.0 

(1.1) 

2.0 

(2.5) 

1.1 

(1.2) 

2.1 

(2.4) 
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Table  3.  Absolute  errors  in  isomerization  energies  (kcal/mol)  at  HF/6-31++G(d,p)  for 
alpha  helixes,  relative  to  fully  ab  initio  (no  SFM).  The  non-bonded  terms  use  the 
eombined  EFP/ah  initio  approximation  (cutoff  >2.7A)  or  ab  initio  (given  in  parentheses). 


2nd  Order 
HF/6-31++G(d,p) 

3rd  Order 
HF/6-31++G(d,p) 

Isomer 

HF  kcal/mol 

HF  kcal/mol 

MAQWUW_1  -MAQWUW_2 

2.7  (1.1) 

1.7  (0.2) 

WUYCUO-WUYDAV 

5.7  (5.8) 

3.9  (6.1) 

WUYCUO-WUYDEX 

0.9  (2.5) 

0.1  (3.1) 

YETPES_1-YETPES_2 

1.1  (5.3) 

0.2  (1.0) 

MAE 

2.6  (3.7) 

1.5  (2.6) 
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Table  4.  Net  CPU  times  (minutes)  for  the  SFM  HF/6-31++G(d,p)  on  a  single  eore  of  a 
Xeon  2.66Ghz  quad  eore  Cloverton  node,  with  16  GB  RAM.  Net  times  include  the  time 
needed  for  EFP  generation.  The  EFP  generation  time  is  given  in  parentheses.  The  total 
number  of  non-bonded  terms  is  also  listed.  The  heading  EEP  indicates  the  use  of  EEP  for 
the  non-bonded  terms. 


Isomer 

2nd 

#  non-bonded 
terms 

Order  Non-bonded 

EFP 

No  EFP 

3rd 

#  non-bonded 
terms 

Order  Non-bonded 

EFP 

No  EFP 

MAQWUW_1 

1113 

128  (25) 

144 

3159 

329  (191) 

559 

MAQWUW_2 

1113 

128  (26) 

140 

3155 

333  (194) 

562 

WUYCUO 

1752 

155  (33) 

182 

5049 

413  (214) 

853 

WUYDAV 

1754 

162  (33) 

188 

5059 

439  (227) 

909 

WUYDEX 

1754 

150  (32) 

180 

5052 

419  (217) 

880 

YETPES_1 

932 

115  (24) 

122 

2623 

305  (170) 

490 

YETPES_2 

929 

117  (25) 

126 

2618 

299  (166) 

497 
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Table  5.  Comparison  of  the  number  of  ab  initio  non-bonded  terms  needed  for  non- 
bonded  EFP/ab  initio  eutoffs  set  to  2.7  and  4.5A  at  the  2"^^  and  order  many  body 
approximation. 


2'"^  Order  Many  Body  3'“^  Order  Many  Body 

2.1  A  (terms)  4.5A  (terms)  2.1  A  (terms)  4.5A  (terms) 


MAQWUW_1 

34 

233 

113 

729 

MAQWUW_2 

34 

224 

106 

693 

WUYCUO 

35 

318 

108 

1054 

WUYDAV 

40 

327 

130 

1075 

WUYDEX 

36 

321 

118 

1055 

YETPES_1 

29 

225 

79 

709 

YETPES_2 

25 

225 

70 

708 
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Table  6.  Absolute  errors  in  the  FM02-MP2  and  FM03-MP2  total  energy  of  the  32  water 
elusters  seleeted  from  EFP  Monte  Carlo/Simulated  Annealing  simulations.  Isomer  names 
are  only  used  to  distinguish  isomers  from  one  another. 


Absolute  Error  (kcal/mol) 

6-31++G(d,p) _ 6-311++G(3df,2p) _ 

Isomer"*  FM02-MP2  FM03-MP2  FM02-MP2  FM03-MP2 


32_1 

11.8 

2.2 

26.8 

1.0 

32_2 

12.4 

2.5 

28.0 

1.2 

32_3 

11.4 

1.9 

27.3 

1.3 

32ab 

12.5 

2.5 

27.4 

1.3 

32ad 

11.8 

2.5 

27.3 

1.2 

32h 

12.3 

2.5 

27.3 

1.3 

32o 

12.5 

2.5 

25.8 

1.0 

32z 

12.4 

2.3 

24.6 

1.2 

“One  water  molecule  chosen  as  a  monomer 
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Table  7.  Relative  FM02-MP2  and  FM03-MP2  energies  of  the  32  water  clusters  selected 
from  EFP  Monte  Carlo/Simulated  Annealing  simulations.  Isomer  names  are  only  used  to 
distinguish  isomers  from  one  another. 


Relative  Energies  (kcal/mol) 
6-31++G(d,p) 

6-311++G(3df,2p) 

Isomer“ 

FM02-MP2 

FM03-MP2 

ab  initio 

FM02-MP2  FM03-MP2 

ab  initio 

32_1 

0.0 

0.0 

0.0 

0.0 

0.0 

0.0 

32z 

1.3 

0.5 

0.2 

0.7 

0.2 

0.1 

32_2 

1.4 

1.2 

0.9 

1.1 

0.8 

0.5 

32ab 

1.5 

1.2 

0.9 

1.1 

0.9 

0.6 

32h 

1.4 

1.2 

0.9 

1.4 

1.0 

0.7 

32o 

1.7 

1.5 

1.2 

1.7 

1.3 

1.0 

32ad 

4.9 

6.0 

6.0 

5.7 

6.1 

5.8 

32  3 

11.8 

14.3 

14.1 

14.0 

14.1 

14.4 

“One  water  molecule  chosen  as  a  monomer 
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Table  8.  FM02  errors  (kcal/mol)  for  tetramer  and  hexamer  ionic  liquid  clusters. 


Absolute  Error  (kcal/mol) 
6-31+G(d) 

T  etramers 

FM02-MP2 

FM03-MP2 

1  -H,4-H- 1 ,2,4-triazolium  dinitramide 

0.06 

0.02 

1  -amino,4-H- 1 ,2,4-triazolium  dinitramide 
Hexamers 

0.69 

0.03 

1  -H,4-H- 1 ,2,4-triazolium  dinitramide 

0.32 

0.07 

1  -amino,4-H- 1 ,2,4-triazolium  dinitramide 

1.35 

0.27 
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Table  9.  Comparison  of  Mulliken  charges  for  all  ionic  liquid  systems  investigated. 


Tetramers 

Mulliken  Charges 
6-31+G(d) 
FM02-MP2 

FM03-MP2 

MP2 

1  -H,4-H- 1 ,2,4-triazolium  dinitramide 

Cation  1 

0.82 

0.77 

0.74 

Cation  2 

0.82 

0.77 

0.74 

Anion  1 

-0.82 

-0.77 

-0.74 

Anion  2 

-0.82 

-0.77 

-0.74 

1  -amino, 4-H-l ,2,4-triazolium  dinitramide 

Cation  1 

0.87 

0.84 

0.82 

Cation  2 

0.82 

0.82 

0.82 

Anion  1 

-0.89 

-0.94 

-0.93 

Hexamers 

Anion  2 

-0.80 

-0.74 

-0.71 

1  -H,4-H- 1 ,2,4-triazolium  dinitramide 

Cation  1 

0.86 

0.82 

0.79 

Cation  2 

0.88 

0.85 

0.80 

Cation  3 

0.94 

0.91 

0.87 

Anion  1 

-0.83 

-0.79 

-0.77 

Anion  2 

-0.95 

-0.92 

-0.88 

Anion  3 

-0.90 

-0.86 

-0.81 

1  -amino,4-H- 1 ,2,4-triazolium  dinitramide 

Cation  1 

0.84 

0.84 

0.88 

Cation  2 

0.79 

0.78 

0.76 

Cation  3 

0.90 

0.89 

0.89 

Anion  1 

-0.81 

-0.92 

-0.95 

Anion  2 

-0.83 

-0.85 

-0.83 

Anion  3 

-0.88 

-0.74 

-0.75 
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Table  10.  Timings  for  ionic  liquid  clusters  performed  on  a  Cray  XT4  with  2.1GHz  AMD 
Opteron64  proeessors.  Eaeh  node  eontains  a  4  eore  CPU  and  8  GB  of  RAM. 


Tetramer 

#CPUs 

Timing  (minutes) 

6-31+G(d) 

FM02-MP2 

MP2 

1  -amino, 4-H- 1 ,2,4-triazolium  dinitramide 

8 

12.2 

28.4 

16 

6.4 

14.7 

32 

3.5 

7.3 

Hexamer 

1  -amino, 4-H- 1 ,2,4-triazolium  dinitramide 

8 

24.0 

172.1 

16 

12.5 

83.9 

32 

6.8 

42.8 

53 


Scheme  1.  Depiction  of  isomers  used  in  Table  2.  Structures  are  from  the  Cambridge 
Structural  Database  (CSD).  Non-hydrogen  atoms  have  been  labeled. 
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Scheme  2.  Depiction  of  alpha  helix  isomers  used  in  Table  4.  Structures  are  from  the 
Cambridge  Structural  Database  (CSD).  Non-hydrogen  atoms  have  been  labeled. 
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FIGURE  CAPTIONS 


Figure  I.  Sandwich,  T-shaped,  and  parallel-displaced  configurations  of  the  benzene 
dimer. 

Figure  2.  Comparison  of  SAPT  and  EFP  interaction  energy  (kcal/mol)  decomposition  as 
a  function  of  the  seperation  (A)  of  benzene  dimers  in  the  sandwhich  configuration. 

Figure  3.  Comparison  of  SAPT  and  EFP  interaction  energy  (kcahmol)  decomposition  as 
a  function  of  the  seperation  R  (A)  of  benzene  dimers  in  the  T-shaped  configuration. 

Figure  4.  Electrostatic  fractioning  of  bonds. 

Figure  5.  Monomer  calculation  performed  in  the  ESP  of  the  full  system. 

Figure  6.  Illustration  of  FMO  approximations  applied  to  three  monomers  I,  J,  E  (left)  and 
as  applied  to  dimer  IJ  and  monomer  E  (right). 

Figure  7.  Eowest  energy  cluster  of  32  water  molecules  obtained  from  EFP  Monte 
Carlo/Simulated  Annealing  simulations. 

Figure  8.  Ion  pairs  of  I-amino,4-H-I,2,4-triazolium  dinitramide  (top)  and  I-H,4-H-I,2,4- 
triazolium  dinitramide  (bottom). 

Figure  9.  Eowest  energy  structure  of  I-amino,4-H-I,2,4-triazolium  dinitramide  tetramer 
obtained  from  an  ab  initio  MP2/6-3I+G(d)  optimization. 

Figure  10.  Eowest  energy  structure  of  I-amino,4-H-I,2,4-triazolium  dinitramide 
hexamer  obtained  from  an  ab  initio  MP2/6-3I+G(d)  optimization. 

Figure  II.  Eowest  energy  structure  of  I-H,4-H-I,2,4-triazolium  dinitramide  tetramer 
obtained  from  an  ab  initio  MP2/6-3I+G(d)  optimization. 

Figure  12.  Eowest  energy  structure  of  I-H,4-H-I,2,4-triazolium  dinitramide  hexamer 
obtained  from  an  ab  initio  MP2/6-3I+G(d)  optimization. 
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»  EFP:  electrostatic 
— D—  SAPT:  electrostatic 
— “ — EFP:  exchange 
— A —  SAPT:  exchange 
•  EFP:  induction 
— o—  SAPT:  induction 
»  EFP:  dispersion 
— o—  SAPT:  dispersion 
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I,  J  and  L  all  as 
separate  monomers 


I  and  J  composing  dimer  IJ 
L  as  a  monomer 


Monomer  I  approximation  cut-off 
Monomer  J  approximation  cut-off 
Monomer  L  approximation  cut-off 
Dimer  IJ 

Dimer  IJ  approximation  cut-off 


68 


69 


70 


1 


71 


3 


72 


73 


74 


